318 research outputs found
Structure Learning in Nested Effects Models
Nested Effects Models (NEMs) are a class of graphical models introduced to
analyze the results of gene perturbation screens. NEMs explore noisy subset
relations between the high-dimensional outputs of phenotyping studies, e.g. the
effects showing in gene expression profiles or as morphological features of the
perturbed cell.
In this paper we expand the statistical basis of NEMs in four directions:
First, we derive a new formula for the likelihood function of a NEM, which
generalizes previous results for binary data. Second, we prove model
identifiability under mild assumptions. Third, we show that the new formulation
of the likelihood allows efficient traversal of model space. Fourth, we
incorporate prior knowledge and an automated variable selection criterion to
decrease the influence of noise in the data.
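To make the "noisy subset relations" concrete, here is a minimal sketch of a log-likelihood for binary data, the simpler setting that the abstract says the new formula generalizes. It is not the paper's formula. The function name `nem_log_likelihood`, the fixed effect attachments, and the error rates `alpha` (false positive) and `beta` (false negative) are assumptions made for illustration only.

```python
import math

def nem_log_likelihood(data, reach, attach, alpha=0.1, beta=0.2):
    """Log-likelihood of binary effect data under a fixed graph (illustrative).

    data[e][k]  : 1 if effect e was observed after knocking down gene k.
    reach[k][s] : True if gene s is k itself or lies downstream of k.
    attach[e]   : index of the gene to which effect e is attached (assumed known).
    """
    ll = 0.0
    for e, row in enumerate(data):
        for k, observed in enumerate(row):
            # The model predicts an effect iff its gene is hit by the knockdown.
            expected = reach[k][attach[e]]
            if expected:
                p = 1 - beta if observed else beta
            else:
                p = alpha if observed else 1 - alpha
            ll += math.log(p)
    return ll

# Two perturbed genes (0 = A, 1 = B); effect 0 is attached to A, effect 1 to B.
attach = [0, 1]
# Graph A -> B: knocking down A also disturbs B, but not the other way round.
reach_a_to_b = [[True, True], [False, True]]
# Reversed graph B -> A.
reach_b_to_a = [[True, False], [True, True]]
# Hypothetical data: the effects seen under knockdown B are a subset of those under A.
data = [[1, 0], [1, 1]]

ll_ab = nem_log_likelihood(data, reach_a_to_b, attach)
ll_ba = nem_log_likelihood(data, reach_b_to_a, attach)
print(ll_ab, ll_ba)
```

With these made-up numbers the graph A -> B scores higher than B -> A, because the observed effect sets are nested in the direction that graph predicts. A full treatment would also handle unknown effect attachments, which this sketch does not.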
Starr: Simple Tiling Array Analysis of Affymetrix ChIP-chip data
Chromatin immunoprecipitation combined with DNA microarrays (ChIP-chip) is an
assay for DNA-protein-binding or post-translational chromatin/histone
modifications. As with all high-throughput technologies, it requires a thorough
bioinformatic processing of the data for which there is no standard yet. The
primary goal is the reliable identification and localization of genomic regions
that bind a specific protein. The second step comprises comparison of binding
profiles of functionally related proteins, or of binding profiles of the same
protein in different genetic backgrounds or environmental conditions.
Ultimately, one would like to gain a mechanistic understanding of the effects
of DNA binding events on gene expression. We present a free, open-source R
package Starr that, in combination with the package Ringo, facilitates the
comparative analysis of ChIP-chip data across experiments and across different
microarray platforms. Core features are data import, quality assessment,
normalization and visualization of the data, and the detection of ChIP-enriched
genomic regions. The use of common Bioconductor classes ensures the
compatibility with other R packages. Most importantly, Starr provides methods
for integration of complementary genomics data, e.g., it enables systematic
investigation of the relation between gene expression and DNA binding.
Learning Monotonic Genotype-Phenotype Maps
Evolutionary escape of pathogens from the selective pressure of immune responses and from medical interventions is driven by the accumulation of mutations. We introduce a statistical model for jointly estimating the dynamics and dependencies among genetic alterations and the associated phenotypic changes. The model integrates conjunctive Bayesian networks, which define a partial order on the occurrences of genetic events, with isotonic regression. The resulting genotype-phenotype map is non-decreasing in the lattice of genotypes. It describes evolutionary escape as a directed process following a phenotypic gradient, such as a monotonic fitness landscape. We present efficient algorithms for parameter estimation and model selection. The model is validated using simulated data and applied to HIV drug resistance data. We find that the effect of many resistance mutations is non-linear and depends on the genetic background in which they occur.
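The isotonic-regression ingredient can be illustrated on the simplest possible case, a totally ordered chain of genotypes. The sketch below uses the pool-adjacent-violators procedure for that case. The model described above works on a partial order (the genotype lattice), which this sketch does not cover. The function name `pava` and the phenotype values are hypothetical.

```python
def pava(y, w=None):
    """Least-squares non-decreasing fit to y via pool-adjacent-violators."""
    n = len(y)
    w = [1.0] * n if w is None else list(w)
    # Each block stores [weighted mean, total weight, number of pooled points].
    blocks = []
    for yi, wi in zip(y, w):
        blocks.append([float(yi), float(wi), 1])
        # Pool backwards for as long as the last two blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / wt, wt, c1 + c2])
    # Expand the pooled blocks back to one fitted value per input point.
    fit = []
    for mean, _, count in blocks:
        fit.extend([mean] * count)
    return fit

# Hypothetical phenotype measurements along a chain of accumulating mutations.
phenotype = [1.0, 3.0, 2.0, 4.0, 3.5]
fitted = pava(phenotype)
print(fitted)  # [1.0, 2.5, 2.5, 3.75, 3.75]
```

Each pair of neighbouring values that breaks the ordering is replaced by its mean, so the fitted map never decreases along the chain.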
Starr: Simple Tiling ARRay analysis of Affymetrix ChIP-chip data
Background: Chromatin immunoprecipitation combined with DNA microarrays (ChIP-chip) is an assay used for investigating DNA-protein-binding or post-translational chromatin/histone modifications. As with all high-throughput technologies, it requires thorough bioinformatic processing of the data, for which there is no standard yet. The primary goal is to reliably identify and localize genomic regions that bind a specific protein. Further investigation compares binding profiles of functionally related proteins, or binding profiles of the same proteins in different genetic backgrounds or experimental conditions. Ultimately, the goal is to gain a mechanistic understanding of the effects of DNA binding events on gene expression. Results: We present a free, open-source R/Bioconductor package Starr that facilitates comparative analysis of ChIP-chip data across experiments and across different microarray platforms. The package provides functions for data import, quality assessment, data visualization and exploration. Starr includes high-level analysis tools such as the alignment of ChIP signals along annotated features, correlation analysis of ChIP signals with complementary genomic data, peak-finding and comparative display of multiple clusters of binding profiles. It uses standard Bioconductor classes for maximum compatibility with other software. Moreover, Starr automatically updates microarray probe annotation files by a highly efficient remapping of microarray probe sequences to an arbitrary genome. Conclusion: Starr is an R package that covers the complete ChIP-chip workflow from data processing to binding pattern detection. It focuses on the high-level data analysis, e.g., it provides methods for the integration and combined statistical analysis of binding profiles and complementary functional genomics data. Starr enables systematic assessment of binding behaviour for groups of genes that are aligned along arbitrary genomic features.
Efficient Maximum Likelihood Estimation for Pedigree Data with the Sum-Product Algorithm
OBJECTIVE We analyze data sets consisting of pedigrees with age at onset of colorectal cancer (CRC) as phenotype. The occurrence of familial clusters of CRC suggests the existence of a latent, inheritable risk factor. We aimed to compute the probability of a family possessing this risk factor as well as the hazard rate increase for these risk factor carriers. Due to the inheritability of this risk factor, the estimation necessitates a costly marginalization of the likelihood. METHODS We propose an improved EM algorithm by applying factor graphs and the sum-product algorithm in the E-step. This reduces the computational complexity from exponential to linear in the number of family members. RESULTS Our algorithm is as precise as a direct likelihood maximization in a simulation study and a real family study on CRC risk. For 250 simulated families of size 19 and 21, the runtime of our algorithm is faster by a factor of 4 and 29, respectively. On the largest family (23 members) in the real data, our algorithm is 6 times faster. CONCLUSION We introduce a flexible and runtime-efficient tool for statistical inference in biomedical event data with latent variables that opens the door for advanced analyses of pedigree data.
Efficient Maximum Likelihood Estimation for Pedigree Data with the Sum-Product Algorithm
In this paper, we analyze data sets consisting of pedigrees where the response is the age at onset of colorectal cancer (CRC). The occurrence of familial clusters of CRC suggests the existence of a latent, inheritable risk factor. We aimed to compute the probability of a family possessing this risk factor, as well as the hazard rate increase for these risk factor carriers. Due to the inheritability of this risk factor, the estimation necessitates a costly marginalization of the likelihood.
We therefore developed an EM algorithm by applying factor graphs and the sum-product algorithm in the E-step, reducing the computational complexity from exponential to linear in the number of family members.
Our algorithm is as precise as a direct likelihood maximization in a simulation study and a real family study on CRC risk. For 250 simulated families of size 19 and 21, the runtime of our algorithm is faster by a factor of 4 and 29, respectively. On the largest family (23 members) in the real data, our algorithm is 6 times faster.
We introduce a flexible and runtime-efficient tool for statistical inference in biomedical event data that opens the door for advanced analyses of pedigree data.
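The exponential-to-linear reduction can be seen in a toy setting. The sketch below assumes a single line of descent (each individual has one modelled parent) with a binary latent carrier status, and compares brute-force marginalization with forward message passing. Real pedigrees have two parents per individual, and the paper's factor graph and EM steps are not reproduced here. All function names and probabilities are hypothetical.

```python
from itertools import product

def brute_force_likelihood(prior, trans, emis):
    """Sum over all 2^n latent carrier configurations (exponential cost)."""
    n = len(emis)
    total = 0.0
    for z in product((0, 1), repeat=n):
        p = prior[z[0]] * emis[0][z[0]]
        for i in range(1, n):
            p *= trans[z[i - 1]][z[i]] * emis[i][z[i]]
        total += p
    return total

def sum_product_likelihood(prior, trans, emis):
    """Same quantity via forward messages along the chain (linear cost)."""
    # Message for the founder: prior times evidence, per latent state.
    msg = [prior[s] * emis[0][s] for s in (0, 1)]
    for i in range(1, len(emis)):
        # Sum out the parent's state, then multiply in the child's evidence.
        msg = [
            sum(msg[a] * trans[a][b] for a in (0, 1)) * emis[i][b]
            for b in (0, 1)
        ]
    return sum(msg)

# Hypothetical numbers, chosen only for illustration.
prior = [0.9, 0.1]                  # P(founder is non-carrier / carrier)
trans = [[1.0, 0.0], [0.5, 0.5]]    # P(child status | parent status)
emis = [[0.8, 0.3], [0.6, 0.7], [0.9, 0.2], [0.5, 0.5]]  # P(data_i | status_i)

lik_slow = brute_force_likelihood(prior, trans, emis)
lik_fast = sum_product_likelihood(prior, trans, emis)
print(lik_slow, lik_fast)
```

Both functions return the same marginal likelihood, but the first enumerates 2^n configurations while the second does a constant amount of work per family member. An E-step would additionally need per-individual posterior marginals, which require backward messages as well.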
FADD and Caspase-8 Regulate Gut Homeostasis and Inflammation by Controlling MLKL- and GSDMD-Mediated Death of Intestinal Epithelial Cells
Pathways controlling intestinal epithelial cell (IEC) death regulate gut immune homeostasis and contribute to the pathogenesis of inflammatory bowel diseases. Here we show that caspase-8 and its adapter FADD act in IECs to regulate intestinal inflammation downstream of Z-DNA binding protein 1 (ZBP1)- and tumor necrosis factor receptor-1 (TNFR1)-mediated receptor interacting protein kinase 1 (RIPK1) and RIPK3 signaling. Mice with IEC-specific FADD or caspase-8 deficiency developed colitis dependent on mixed lineage kinase-like (MLKL)-mediated epithelial cell necroptosis. However, MLKL deficiency fully prevented ileitis caused by epithelial caspase-8 ablation, but only partially ameliorated ileitis in mice lacking FADD in IECs. Our genetic studies revealed that caspase-8 and gasdermin-D (GSDMD) were both required for the development of MLKL-independent ileitis in mice with epithelial FADD deficiency. Therefore, FADD prevents intestinal inflammation downstream of ZBP1 and TNFR1 by inhibiting both MLKL-induced necroptosis and caspase-8-GSDMD-dependent pyroptosis-like death of epithelial cells.
Exact likelihood computation in Boolean networks with probabilistic time delays, and its application in signal network reconstruction
Motivation: For biological pathways, it is common to measure a gene expression time series after various knockdowns of genes that are putatively involved in the process of interest. These interventional time-resolved data are most suitable for the elucidation of dynamic causal relationships in signaling networks. Even with this kind of data it is still a major and largely unsolved challenge to infer the topology and interaction logic of the underlying regulatory network. Results: In this work, we present a novel model-based approach involving Boolean networks to reconstruct small to medium-sized regulatory networks. In particular, we solve the problem of exact likelihood computation in Boolean networks with probabilistic exponential time delays. Simulations demonstrate the high accuracy of our approach. We apply our method to data of Ivanova et al. (2006), where RNA interference knockdown experiments were used to build a network of the key regulatory genes governing mouse stem cell maintenance and differentiation. In contrast to previous analyses of that data set, our method can identify feedback loops and provides new insights into the interplay of some master regulators in embryonic stem cell development. Availability and implementation: The algorithm is implemented in the statistical language R. Code and documentation are available at Bioinformatics online. Supplementary information: Supplementary Materials are available at Bioinformatics online.